In this notebook, a template is provided for you to implement, in stages, the functionality required to successfully complete this project. If additional code that cannot be included in the notebook is required, make sure the Python code is successfully imported and included in your submission. Sections whose header begins with 'Implementation' indicate where you should begin implementing your project. Note that some implementation sections are optional and are marked with 'Optional' in the header.
In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.
Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. In addition, Markdown cells can be edited, typically by double-clicking the cell to enter edit mode.
Visualize the German Traffic Signs Dataset. This is open ended; some suggestions include plotting traffic sign images, plotting the count of each sign, etc. Be creative!
The pickled data is a dictionary with 4 key/value pairs:
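A minimal sketch of the pickle round-trip the loading cell relies on, using a toy dict rather than the project's real files; the real pickles contain at least 'features' (N x 32 x 32 x 3 images) and 'labels' (N class ids):

```python
import io
import pickle
import numpy as np

# Toy stand-in for train.p / test.p: a dict with 'features' and 'labels'.
sample = {'features': np.zeros((2, 32, 32, 3), dtype=np.uint8),
          'labels': np.array([0, 1])}

# Round-trip through an in-memory buffer, exactly as pickle.load does on a file.
buf = io.BytesIO()
pickle.dump(sample, buf)
buf.seek(0)
loaded = pickle.load(buf)
print(sorted(loaded.keys()))      # ['features', 'labels']
print(loaded['features'].shape)   # (2, 32, 32, 3)
```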
# Load pickled data
import numpy as np
import pickle
import skimage.transform
from sklearn.preprocessing import LabelBinarizer
%pylab inline
# TODO: fill this in based on where you saved the training and testing data
training_file = './../Downloads/train.p'
testing_file = './../Downloads/test.p'
with open(training_file, mode='rb') as f:
    train = pickle.load(f)
with open(testing_file, mode='rb') as f:
    test = pickle.load(f)
X_train, y_train = train['features'], train['labels']
X_test, y_test = test['features'], test['labels']
n_train = X_train.shape[0]
n_test = X_test.shape[0]
image_shape = X_train[0].shape  # all images share the same shape
n_classes = len(set(y_train))
print("Number of training examples =", n_train)
print("Number of testing examples =", n_test)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
print(X_train.shape)
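One of the visualizations suggested above (the count of each sign) boils down to `np.bincount`; a toy sketch with a small stand-in for `y_train`:

```python
import numpy as np

# Per-class example counts: index = class id, value = number of examples.
y = np.array([1, 0, 1, 2, 1])            # stand-in for y_train
counts = np.bincount(y, minlength=3)     # minlength = n_classes
print(counts)  # [1 3 1]
```

With the real labels, `plt.bar(np.arange(n_classes), np.bincount(y_train, minlength=n_classes))` gives the class-distribution plot.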
import pandas as pd
df = pd.read_csv("signnames.csv")
def draw(X, y):
    # Show 3 rows of 5 images each, titled with the sign name from signnames.csv.
    for j in range(3):
        figure(figsize=(20, 10))
        for i in range(5):
            subplot(151 + i)
            title(df.iloc[y[i + j * 5]]['SignName'])
            imshow(X[i + j * 5])
        show()
print("Look at Training")
draw(X_train, y_train)
print("Look at Testing")
draw(X_test, y_test)
Design and implement a deep learning model that learns to recognize traffic signs. Train and test your model on the German Traffic Sign Dataset.
There are various aspects to consider when thinking about this problem:
Here is an example of a published baseline model on this problem. You are not required to be familiar with the approach used in the paper, but it's good practice to try to read papers like these.
from skimage.exposure import equalize_hist
from skimage.exposure import equalize_adapthist,adjust_log
X_test = equalize_hist(X_test)
X_train = equalize_hist(X_train)
# Alternative: slightly worse results overall, but better for low lighting
#X_train = [equalize_adapthist(data,kernel_size=32) for data in X_train]
#X_test = [equalize_adapthist(data,kernel_size=32) for data in X_test]
print("Look at Training")
draw(X_train, y_train)
print("Look at Testing")
draw(X_test, y_test)
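For intuition, `equalize_hist` can be sketched in plain NumPy as mapping each pixel through the image's empirical CDF, which spreads intensities over the full range. This is a rough stand-in for illustration, not skimage's exact implementation:

```python
import numpy as np

def equalize(img):
    # Empirical CDF value of each pixel: the fraction of pixels <= that value.
    flat = img.ravel()
    sorted_vals = np.sort(flat)
    cdf = np.searchsorted(sorted_vals, flat, side='right') / flat.size
    return cdf.reshape(img.shape)

img = np.array([[10, 10], [200, 250]], dtype=np.uint8)
out = equalize(img)
print(out)  # dark pixels pushed up, bright pixels at the top of (0, 1]
```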
Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.
binarizer = LabelBinarizer().fit(y_train)
y_train = binarizer.transform(y_train).astype(np.float32)
y_test = binarizer.transform(y_test).astype(np.float32)
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
X_train, y_train = shuffle(X_train, y_train, random_state=42)
X_v, X_t, y_v, y_t = train_test_split(X_test, y_test, test_size=0.5, random_state=42)
print("Look at Training")
draw(X_train, np.argmax(y_train, axis=1))
print("Much better...")
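What the `LabelBinarizer` step above does, sketched in plain NumPy: class ids become one-hot float32 rows, and `np.argmax` recovers the original id (which is why `draw` is called with `np.argmax(y_train, axis=1)`):

```python
import numpy as np

y = np.array([0, 2, 1, 2])                      # stand-in class ids
n_classes = y.max() + 1
onehot = np.eye(n_classes, dtype=np.float32)[y]  # one row of the identity per label
print(onehot.shape)                # (4, 3)
print(np.argmax(onehot, axis=1))   # [0 2 1 2] -- round-trips back to the ids
```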
Describe the techniques used to preprocess the data.
Answer:
I did not do much preprocessing: I only applied histogram equalization (equalize_hist) to the training and test images.
Specifically, I did not center the image values around 0, as is sometimes (but not always) recommended. I tried both and got very similar results.
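The mean-centering mentioned above can be sketched as follows; `center` is a hypothetical helper for illustration, not code from this notebook:

```python
import numpy as np

def center(images):
    # Scale uint8 pixels to [0, 1], then shift to roughly [-0.5, 0.5].
    return images.astype(np.float32) / 255.0 - 0.5

x = np.array([[[[0, 128, 255]]]], dtype=np.uint8)
c = center(x)
print(c.min(), c.max())  # -0.5 0.5
```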
# Additional Data and iterator functions
# Note: inspired by keras.preprocessing.image
import scipy.ndimage as ndi
import itertools
def apply_transform(x, transform_matrix):
    # Channels-first for scipy, apply the affine transform per channel, then back.
    x = np.rollaxis(x, 2, 0)
    final_affine_matrix = transform_matrix[:2, :2]
    final_offset = transform_matrix[:2, 2]
    channel_images = [ndi.interpolation.affine_transform(x_channel, final_affine_matrix,
                                                         final_offset, order=0, mode='nearest')
                      for x_channel in x]
    x = np.stack(channel_images, axis=0)
    x = np.rollaxis(x, 0, 3)
    return x

def random_transform(x, rg, height_shift_range, width_shift_range, shear_range, zoom_range):
    h, w = x.shape[0], x.shape[1]
    # Random rotation (rg in degrees), shift, shear, and zoom, composed into one matrix.
    theta = np.pi / 180 * np.random.uniform(-rg, rg)
    rotation_matrix = np.array([[np.cos(theta), -np.sin(theta), 0],
                                [np.sin(theta), np.cos(theta), 0],
                                [0, 0, 1]])
    tx = h * np.random.uniform(-height_shift_range, height_shift_range)
    ty = w * np.random.uniform(-width_shift_range, width_shift_range)
    translation_matrix = np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]])
    shear = np.random.uniform(-shear_range, shear_range)
    shear_matrix = np.array([[1, -np.sin(shear), 0], [0, np.cos(shear), 0], [0, 0, 1]])
    zx, zy = np.random.uniform(zoom_range[0], zoom_range[1], 2)
    zoom_matrix = np.array([[zx, 0, 0], [0, zy, 0], [0, 0, 1]])
    transform_matrix = np.dot(np.dot(np.dot(rotation_matrix, translation_matrix), shear_matrix), zoom_matrix)
    transform_matrix = transform_matrix_offset_center(transform_matrix, h, w)
    return apply_transform(x, transform_matrix)

def transform_matrix_offset_center(matrix, x, y):
    # Wrap the transform in offset/reset translations so it acts about the image center.
    o_x = float(x) / 2 + 0.5
    o_y = float(y) / 2 + 0.5
    offset_matrix = np.array([[1, 0, o_x], [0, 1, o_y], [0, 0, 1]])
    reset_matrix = np.array([[1, 0, -o_x], [0, 1, -o_y], [0, 0, 1]])
    return np.dot(np.dot(offset_matrix, matrix), reset_matrix)

def trans(X):
    # Endless stream of randomly jittered training images.
    for element in itertools.cycle(X):
        yield random_transform(element, rg=0.2, height_shift_range=0.1,
                               width_shift_range=0.1, shear_range=0.5, zoom_range=(0.9, 1.1))

def forward(X):
    # Endless stream of the elements unchanged (labels, validation/test data).
    for element in itertools.cycle(X):
        yield element
Xtrain_iterator = trans(X_train)
ytrain_iterator = forward(y_train)
Xv_iterator = forward(X_v)
Xt_iterator = forward(X_t)
yv_iterator = forward(y_v)
yt_iterator = forward(y_t)
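The `transform_matrix_offset_center` trick above can be sanity-checked numerically: wrapping a transform in the offset/reset translations makes the image center a fixed point, so e.g. a rotation spins the image about its center instead of its corner:

```python
import numpy as np

def offset_center(matrix, h, w):
    # Same construction as transform_matrix_offset_center above.
    o_x, o_y = h / 2 + 0.5, w / 2 + 0.5
    offset = np.array([[1, 0, o_x], [0, 1, o_y], [0, 0, 1]])
    reset = np.array([[1, 0, -o_x], [0, 1, -o_y], [0, 0, 1]])
    return offset @ matrix @ reset

theta = np.pi / 4
rot = np.array([[np.cos(theta), -np.sin(theta), 0],
                [np.sin(theta),  np.cos(theta), 0],
                [0, 0, 1]])
m = offset_center(rot, 32, 32)
c = np.array([16.5, 16.5, 1.0])   # homogeneous coordinate of the 32x32 center
print(m @ c)  # stays at [16.5, 16.5, 1.0]: the center is a fixed point
```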
Describe how you set up the training, validation and testing data for your model. If you generated additional data, why?
Answer:
I generated training data on the fly (a pipeline inspired by Keras' ImageDataGenerator; I also took their concept of using ndimage and matrix transforms instead of a full image library). This way the training loop never sees exactly the same image twice, only slightly jittered versions. This was the biggest single improvement, taking me from about 95% to 98%. I split the test data 50%/50% into validation and test sets.
import tensorflow as tf
#batch_size = 256
patch_size = 3
depth = 10
num_hidden = 512
tf_data = tf.placeholder(tf.float32, shape=(None, 32, 32, 3))
tf_labels = tf.placeholder(tf.float32, shape=(None, n_classes))
tf_keep = tf.placeholder(tf.float32)
# Variables.
conv1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, 3, 32], stddev=0.1))
#conv1_biases = tf.Variable(tf.zeros([32])) # I dont use these anymore
conv2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, 32, 32], stddev=0.1))
#conv2_biases = tf.Variable(tf.zeros([32]))
conv3_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, 32, 64], stddev=0.1))
#conv3_biases = tf.Variable(tf.zeros([64]))
conv4_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, 64, 64], stddev=0.1))
#conv4_biases = tf.Variable(tf.zeros([64]))
conv5_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, 64, 128], stddev=0.1))
conv5_biases = tf.Variable(tf.zeros([128]))
conv6_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, 128, 128], stddev=0.1))
conv6_biases = tf.Variable(tf.zeros([128]))
layer3_weights = tf.Variable(tf.truncated_normal([2048, num_hidden], stddev=0.1))
layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
layer4_weights = tf.Variable(tf.truncated_normal([num_hidden, n_classes], stddev=0.1))
layer4_biases = tf.Variable(tf.constant(1.0, shape=[n_classes]))
def model(data):
    data = tf.reshape(data, [-1, 32, 32, 3])
    # Conv block 1: two 3x3 convs, depth 32, then 2x2 max pool -> 16x16
    conv = tf.nn.conv2d(data, conv1_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.relu(conv)
    conv = tf.nn.conv2d(conv, conv2_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    # Conv block 2: depth 64, pool -> 8x8
    conv = tf.nn.conv2d(conv, conv3_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.relu(conv)
    conv = tf.nn.conv2d(conv, conv4_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    # Conv block 3: depth 128, pool -> 4x4
    conv = tf.nn.conv2d(conv, conv5_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.relu(conv)
    conv = tf.nn.conv2d(conv, conv6_weights, [1, 1, 1, 1], padding='SAME')
    conv = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    # Flatten (4 * 4 * 128 = 2048), dropout, dense, dropout, output
    reshape = tf.reshape(conv, [-1, 2048])
    reshape = tf.nn.dropout(reshape, tf_keep)
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    hidden_drop = tf.nn.dropout(hidden, tf_keep)
    return tf.matmul(hidden_drop, layer4_weights) + layer4_biases
logits = model(tf_data)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_labels))
prediction = tf.nn.softmax(logits)
optimizer = tf.train.AdamOptimizer(0.001).minimize(loss)
correct_prediction = tf.equal(tf.argmax(prediction,1), tf.argmax(tf_labels,1))
tf_accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.) For reference on how to build a deep neural network using TensorFlow, see Deep Neural Network in TensorFlow from the classroom.
Answer:
My model is a 6-layer CNN with 3x3 patches and max-pooling every two conv layers, increasing the depth from 3 to 32 to 64 to 128, followed by a dropout layer, a dense layer with 512 nodes, another dropout layer, and the output. This seems fairly standard, apart from the fact that I do not use biases in the conv layers; with biases my validation accuracy dropped below 98%.
I tried many more architectures, but this one got the best result. I would be very interested in better designs though! Please comment on how to get the 99% mentioned in some papers; I tried some of the architectures from those papers but did not get good results (about 95%).
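The 2048 flattened size hardcoded in the dense layer follows directly from the architecture: three 2x2 max pools take the 32x32 input down to 4x4, and the last conv stack has depth 128. A quick check:

```python
# Spatial size after three 2x2/stride-2 max pools, starting from 32x32.
h = w = 32
for _ in range(3):
    h, w = h // 2, w // 2
flat = h * w * 128   # final depth is 128
print(h, w, flat)    # 4 4 2048
```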
batch_size = 256
epoch = 100
num_steps = int(np.ceil(n_train / batch_size))
num_steps_val = int(np.ceil(n_test / batch_size / 2))
num_steps_test = int(np.ceil(n_test / batch_size / 2))
def bar(i):
    # Simple text progress bar: 20 segments plus a percentage.
    return "".join(["["] + ["="] * int(i / 5 + 1) + [" "] * (20 - int(i / 5 + 1)) + ["]"]) + " %03d%%" % (i + 1)
Xtrain_iterator = trans(X_train)
ytrain_iterator = forward(y_train)
Xv_iterator = forward(X_v)
Xt_iterator = forward(X_t)
yv_iterator = forward(y_v)
yt_iterator = forward(y_t)
with tf.Session() as session:
    tf.initialize_all_variables().run()
    for i in range(epoch):
        loss1, loss2, acc1, acc2 = [], [], [], []
        # Training pass over the jittered stream
        for j in range(num_steps):
            print("\r %d/%d" % (j, num_steps), end="")
            print(bar(int(j / num_steps * 100)), end="")
            batch_data = np.array([next(Xtrain_iterator) for _ in range(batch_size)])
            batch_labels = np.array([next(ytrain_iterator) for _ in range(batch_size)])
            feed_dict = {tf_data: batch_data, tf_labels: batch_labels, tf_keep: 0.4}
            _, l, predictions, accuracy = session.run([optimizer, loss, prediction, tf_accuracy],
                                                      feed_dict=feed_dict)
            loss1.append(l)
            acc1.append(accuracy)
        # Validation pass (no dropout)
        for j in range(num_steps_val):
            batch_data = np.array([next(Xv_iterator) for _ in range(batch_size)])
            batch_labels = np.array([next(yv_iterator) for _ in range(batch_size)])
            feed_dict = {tf_data: batch_data, tf_labels: batch_labels, tf_keep: 1.0}
            l, predictions, accuracy = session.run([loss, prediction, tf_accuracy], feed_dict=feed_dict)
            loss2.append(l)
            acc2.append(accuracy)
        print("")
        print("Epoch %d/%d" % (i + 1, epoch))
        print("Train Loss: %f Validation Loss: %f" % (np.mean(loss1), np.mean(loss2)))
        print("Train Acc: %f Validation Acc: %f" % (np.mean(acc1), np.mean(acc2)))
    saver = tf.train.Saver()
    save_path = saver.save(session, "./model-final.ckpt")
Typical result after 100 epochs:
Let's have a look at the test data next.
loss_array = []
acc_array = []
f = []
p = []
with tf.Session() as sess:
    saver = tf.train.Saver()
    saver.restore(sess, "./model-final.ckpt")
    print("Model restored.")
    # Evaluate in chunks of 1000; track which predictions failed (f) and what was predicted (p).
    for i in range(int(np.ceil(len(X_t) / 1000))):
        batch_data = X_t[i * 1000:i * 1000 + 1000]
        batch_labels = y_t[i * 1000:i * 1000 + 1000]
        feed_dict = {tf_data: batch_data, tf_labels: batch_labels, tf_keep: 1.0}
        l2, predictions, a = sess.run([loss, prediction, tf_accuracy], feed_dict=feed_dict)
        loss_array.append(l2)
        acc_array.append(a)
        f = np.concatenate([f, np.argmax(predictions, 1) != np.argmax(batch_labels, 1)])
        p = np.concatenate([p, np.argmax(predictions, 1)])
print("Accuracy %")
print(np.mean(acc_array))
print("Loss")
print(np.mean(loss_array))
print("")
f = f.astype(bool)
print("Overview of failures")
draw(X_t[f], np.argmax(y_t[f], axis=1))
print("Predictions")
draw(X_t[f], p[f].astype(np.int))
How did you train your model? (Type of optimizer, batch size, epochs, hyperparameters, etc.)
Answer:
I used the Adam optimizer with a learning rate of 0.001, a batch size of 256, 100 epochs, and a dropout keep probability of 0.4 during training (1.0 for evaluation).
These parameters are not rigorously optimized; it was more of a "this works" approach.
What approach did you take in coming up with a solution to this problem?
Answer:
Since this is an image recognition problem, I started with a basic CNN (like the MNIST one in the TensorFlow tutorial on the TensorFlow website). I then increased its size and added more dropout, but that limited me to about 94%, so I added image jittering.
I also tried the techniques used in the Sermanet Paper but was not successful with those at all.
Take several pictures of traffic signs that you find on the web or around you (at least five), and run them through your classifier on your computer to produce example results. The classifier might not recognize some local signs but it could prove interesting nonetheless.
You may find signnames.csv useful as it contains mappings from the class id (integer) to the actual sign name.
import os
import skimage.data
from skimage.transform import resize
newdata = np.array([resize(skimage.data.imread("./collect-japan/" + name), output_shape=(32, 32, 3))
                    for name in os.listdir("./collect-japan/")])
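The resize step only has to honor a shape contract: any H x W x 3 image becomes 32 x 32 x 3 so it can be fed to the network. A pure-NumPy nearest-neighbour stand-in for skimage's `resize`, just to illustrate that contract:

```python
import numpy as np

def resize_nn(img, out_h=32, out_w=32):
    # Nearest-neighbour resize: pick source rows/columns at evenly spaced positions.
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

img = np.random.rand(123, 77, 3)   # arbitrary camera-crop shape
print(resize_nn(img).shape)        # (32, 32, 3)
```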
newdata = equalize_hist(newdata)
top5 = tf.nn.top_k(prediction, 5)
with tf.Session() as sess:
    saver = tf.train.Saver()
    saver.restore(sess, "./model-final.ckpt")
    print("Model restored.")
    feed_dict = {tf_data: newdata, tf_keep: 1.0}
    predictions, t5 = sess.run([prediction, top5], feed_dict=feed_dict)
Choose five candidate images of traffic signs and provide them in the report. Are there any particular qualities of the image(s) that might make classification difficult? It would be helpful to plot the images in the notebook.
Answer:
I took a few more than five. One challenge is that they are Japanese (local) signs, and some of them (e.g. no parking) are not in the dataset, so perfect results cannot be expected, but I want to see what the network says anyway. Let's have a look:
draw(newdata, t5[1][:,0])
Is your model able to perform equally well on captured pictures or a live camera stream when compared to testing on the dataset?
Answer:
Yes, it did really well! It can even classify a Japanese stop sign even though it has never seen one, although that is partly luck.
It is not able to classify the no-parking sign correctly, but that is because it is, surprisingly, not in the training dataset. Looking at the table below, the model is also quite confident in this wrong prediction. Not good, but that can only be fixed with more data. It confuses the sign with Keep right, which is indeed somewhat similar, and in bad lighting with a speed limit sign and No vehicles (but those only with low certainty).
Other problems are also caused by signs not being in the training set, so the model cannot find a sensible match: No trucks is seen as 99% Speed limit (100km/h).
import matplotlib.pyplot as plt
short = [s[:30] for s in df.as_matrix()[:,1]]
for loop in range(3):
    figure(figsize=(20, 3))
    for signnum in range(5):
        subplot(151 + signnum)
        title("Image %d" % (loop * 5 + signnum + 1))
        rects1 = plt.bar(np.arange(5), t5[0][loop * 5 + signnum], 0.5)
        s = [s[:30] for s in df.iloc[t5[1][loop * 5 + signnum].astype(int)]['SignName']]
        plt.xticks(np.arange(5) + 0.25, s, rotation='vertical')
        xlim(0, 5)
        ylim(0, 1)
    show()
Use the model's softmax probabilities to visualize the certainty of its predictions, tf.nn.top_k could prove helpful here. Which predictions is the model certain of? Uncertain? If the model was incorrect in its initial prediction, does the correct prediction appear in the top k? (k should be 5 at most)
Answer: See above for visualization.
More data in bad lighting might help to improve performance in bad lighting (I also tried preprocessing with equalize_adapthist, which improves bad-lighting performance but increases sensitivity to noise). Against unknown signs, only training with those signs will help.
If necessary, provide documentation for how an interface was built for your model to load and classify newly-acquired images.
Answer: I took some pictures with my phone and manually cropped the part containing the sign. Then I load and resize the images, and they run through the same pipeline as the dataset.
Note: Once you have completed all of the code implementations and successfully answered each question above, you may finalize your work by exporting the iPython Notebook as an HTML document. You can do this by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.